Integrating MAP, marginals, and unsupervised language model adaptation
Authors
Abstract
We investigate the integration of several language model adaptation approaches for a cross-genre adaptation task, aiming to improve Mandarin ASR performance on a recently introduced genre, broadcast conversation (BC). The adaptation strategies, evaluated by their effect on ASR performance, include unsupervised language model adaptation from ASR transcripts and ways to integrate supervised Maximum A Posteriori (MAP) and marginal adaptation within the unsupervised adaptation framework. We found that by effectively combining these approaches, we can achieve as much as a 1.3% absolute gain (6% relative) in the final recognition error rate on the BC genre.
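The abstract does not give the adaptation formulas. A common form of MAP language model adaptation treats the background model as a Dirichlet prior and combines it with counts from (possibly errorful) ASR transcripts of the new genre. The sketch below illustrates this at the unigram level; the corpora, vocabulary, and the prior weight `tau` are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of MAP-style unigram adaptation: combine a background model
# (acting as a Dirichlet prior with strength tau) with counts from ASR
# transcripts of the new genre. All data and tau are illustrative assumptions.
from collections import Counter

def map_adapt_unigram(background_probs, adaptation_counts, tau=10.0):
    """P_map(w) = (tau * P_bg(w) + c(w)) / (tau + N)."""
    n = sum(adaptation_counts.values())
    vocab = set(background_probs) | set(adaptation_counts)
    return {w: (tau * background_probs.get(w, 0.0) + adaptation_counts.get(w, 0))
               / (tau + n)
            for w in vocab}

bg = {"the": 0.5, "game": 0.3, "market": 0.2}   # background (broadcast news) model
adapt = Counter({"game": 3, "show": 2})          # counts from BC ASR transcripts
adapted = map_adapt_unigram(bg, adapt, tau=10.0)
```

A small `tau` trusts the in-genre transcripts more; a large `tau` keeps the adapted model close to the background.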
Similar papers
Unsupervised language model adaptation using latent semantic marginals
We integrated the Latent Dirichlet Allocation (LDA) approach, a latent semantic analysis model, into an unsupervised language model adaptation framework. We adapted a background language model by minimizing the Kullback-Leibler divergence between the adapted model and the background model, subject to the constraint that the marginalized unigram probability distribution of the adapted model is equal t...
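Minimizing KL divergence to the background model under a unigram marginal constraint has a well-known closed form: each background probability is rescaled by a factor derived from the ratio of the target and background unigram marginals, then renormalized per history. The sketch below shows that rescaling for a toy bigram model; the exponent `beta` and all distributions are assumptions for illustration.

```python
# Sketch of marginal (unigram-constraint) adaptation: rescale background
# bigram probabilities by alpha(w) = (P_new(w) / P_bg(w)) ** beta and
# renormalize within each history. beta and the toy data are assumptions.
def marginal_adapt(bigram, bg_unigram, new_unigram, beta=1.0):
    alpha = {w: (new_unigram[w] / bg_unigram[w]) ** beta for w in bg_unigram}
    adapted = {}
    for history, dist in bigram.items():
        scaled = {w: p * alpha[w] for w, p in dist.items()}
        z = sum(scaled.values())                 # per-history normalizer
        adapted[history] = {w: p / z for w, p in scaled.items()}
    return adapted

bg_uni = {"a": 0.5, "b": 0.5}                    # background unigram marginals
new_uni = {"a": 0.8, "b": 0.2}                   # target (e.g. LDA-derived) marginals
bigram = {"<s>": {"a": 0.5, "b": 0.5}}
adapted = marginal_adapt(bigram, bg_uni, new_uni, beta=1.0)
```

With `beta=1` and a uniform history distribution, the adapted conditional matches the target unigram exactly; smaller `beta` interpolates toward the background.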
Live speech recognition in sports games by adaptation of acoustic model and language model
This paper proposes a method to automatically extract keywords from baseball radio speech through LVCSR for highlight scene retrieval. For robust recognition, we employed acoustic and language model adaptation. In acoustic model adaptation, supervised and unsupervised adaptation were carried out using MLLR+MAP. With this two-level adaptation, word accuracy was improved by 28%. In language model ...
Integrating MAP and linear transformation for language model adaptation
This paper discusses the integration of various language model (LM) adaptations. Ways of integrating Maximum A Posteriori (MAP) adaptation and linear transformation of bigram probability vectors are introduced and evaluated. This method yields only small improvements for adaptation corpora of fewer than 15,000 words. Another method, based on a data augmentation technique by means of a distance bet...
MAP adaptation of stochastic grammars
This paper investigates supervised and unsupervised adaptation of stochastic grammars, including n-gram language models and probabilistic context-free grammars (PCFGs), to a new domain. It is shown that the commonly used approaches of count merging and model interpolation are special cases of a more general maximum a posteriori (MAP) framework, which additionally allows for alternate adaptation ...
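The two special cases named here are easy to contrast at the unigram level: count merging adds (possibly scaled) in-domain counts to background counts before normalizing, while model interpolation mixes the two probability distributions directly. A minimal sketch, with all data and weights assumed for illustration:

```python
# Count merging vs. model interpolation, the two MAP special cases the
# abstract names. The scale and lambda values are illustrative assumptions.
def count_merge(c_bg, c_in, scale=1.0):
    """Merge background and scaled in-domain counts, then normalize."""
    n = sum(c_bg.values()) + scale * sum(c_in.values())
    vocab = set(c_bg) | set(c_in)
    return {w: (c_bg.get(w, 0) + scale * c_in.get(w, 0)) / n for w in vocab}

def interpolate(p_bg, p_in, lam=0.5):
    """Mix background and in-domain probability distributions directly."""
    vocab = set(p_bg) | set(p_in)
    return {w: lam * p_in.get(w, 0.0) + (1 - lam) * p_bg.get(w, 0.0)
            for w in vocab}

merged = count_merge({"a": 2, "b": 2}, {"b": 2}, scale=1.0)
mixed = interpolate({"a": 0.5, "b": 0.5}, {"b": 1.0}, lam=0.5)
```

The difference matters when corpus sizes are unequal: merging weights each source by its count mass, whereas interpolation weights them only by `lam`.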
Extracting and Selecting Relevant Corpora for Domain Adaptation in MT
The paper presents a scheme for performing domain adaptation for multiple domains simultaneously. The proposed method segments a large corpus into parts using self-organizing maps (SOMs). After a SOM is drawn over the documents, an agglomerative clustering algorithm determines how many clusters the text collection comprises. This means that the clustering process is unsupervised, although choi...
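The corpus-segmentation idea can be illustrated without a full SOM: represent documents as bag-of-words vectors and group them bottom-up by similarity. The sketch below substitutes plain single-linkage agglomerative clustering over cosine similarity for the SOM stage; this simplification and all data are assumptions, not the paper's method.

```python
# Illustrative corpus segmentation: single-linkage agglomerative clustering
# of bag-of-words document vectors by cosine similarity. This stands in for
# the SOM + agglomerative pipeline the abstract describes; data is toy data.
import math

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def agglomerate(docs, n_clusters):
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > n_clusters:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: similarity of the closest document pair
                sim = max(cosine(docs[a], docs[b])
                          for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters

docs = [{"game": 1.0}, {"game": 1.0, "score": 0.1}, {"market": 1.0}]
clusters = agglomerate(docs, n_clusters=2)
```

Clusters nearest to an in-domain seed corpus could then be selected as adaptation data, which is the selection step the abstract alludes to.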